Intro

Here we document the development Atlantis diagnostics code to determine whether the model is meeting define performance and review criteria. Code we be developed to evaluate the model output against detailed performance criteria developed in [1] and the calibration criteria in [2]:

  1. All functional groups persist

  2. Model stabilizes for the last ~20 years of an unfished, unperturbed 80-100 year run

  3. Details for calibration:

    • Unfished system:
      • produce stable biomass through time under stable-environmental forcing
      • keep all groups from going extinct
      • obtain stable biomasses that are not oscillating more than a certain percentage from initial values
    • System with constant fishing: reproduce realistic responses to a range of fishing pressures
  4. Hindcast period established where we have survey/assessment time series with error bounds

  5. Species groups totaling ~80% of system biomass should qualitatively match hindcast biomass trends. During calibration: reproduce historical trends in biomass when forcing the model with historical fishing and environmental drivers.

  6. Patterns of temporal variability captured (emergent or forced with e.g. recruitment time series)

  7. Productivity for groups totaling ~80% of system biomass should qualitatively match FMSY or life history expectations

  8. Natural mortality decreases with age for majority of groups

  9. Age and length structure qualitatively matches expectations for majority of groups

  10. Diet predicted qualitatively matches empirical diet comp for majority of groups

An R function for each criterion is developed below, and a wrapper that runs all functions will be developed and tested here.

We use these R libraries, with non-CRAN package installation instructions in comments.

#remotes::install_github(\"noaa-edab/ecodata\",build_vignettes=TRUE)
#remotes::install_github(\"r4atlantis/atlantisom\",build_vignettes=TRUE)
#remotes::install_github(\"noaa-edab/ecotrend\",build_vignettes=TRUE)
#remotes::install_github("andybeet/atlantisdrive")

library(ecodata)
library(ecotrend)
library(tidyverse)
library(atlantisom)
library(here)
library(googledrive)
library(atlantisdrive)

source("shift_legend.R")

Configure files to be read in here (can have different files to source).

January 2021: source output from google folder EDABranch_Drive > Atlantis > Shared Data or EDABranch_Drive > Atlantis > Shared Data where each set of outputs in its own folder. Source input files from this github repo in folder here("currentVersion").

UPDATE: If files are pushed to google using atlantisdrive then all relevant input and output files will be in the same google folder and will download to the temp folder under diagnostics (or wherever the user directs it).

So the function below will set everything up using this DiagnosticConfig.R file, which loads all files using atlantisdrive:

# CHECK ALL FILENAMES FOR YOUR MODEL

# where are files on the atlantis google drive? 
g.name <- "Testing/OutForSarah"

# which local directory should they write to?
d.name <- here::here("diagnostics", "temp")

# input file names
functional.groups.file <- "neus_groups.csv"  
biomass.pools.file <- "neus_init.nc"
box.file <- "neus_tmerc_RM2.bgm"
initial.conditions.file <- "neus_init.nc"
fisheries.file <- "neus_fisheries.csv"
biol.prm.file <- "at_biology.prm"


# define the hindcast period--these are survey years
hindcast <- c(1980:2010)

The function diag_init takes the user’s config file as input, brings all Atlantis files into a local directory from google drive, and uses atlantisom functions to make a list of object including species names, functional group attributes, model boxes, run parameters, and biological parameters, and the biomass and catch time series outputs from .txt files. More can be added to this as needed.

diag_init <- function(config_file){
  
  source(config_file)
  
  # pull all files from google
  atlantisdrive::pull_from_drive(d.name,
                                 fileList = NULL,
                                 googledriveFolder = g.name)
  
  # get scenario name from a file inside folder: SSB is unique
  scenario.name <- gsub("SSB.txt","",list.files(path=d.name, pattern = "*SSB.txt"))
  
  # output file names
  run.prm.file <- list.files(path=d.name, pattern = "*at_run*.xml")
  bioind.file <- paste0(scenario.name, "BiomIndx.txt")
  catch.file <- paste0(scenario.name, "Catch.txt")
  
  
  #Load functional groups
  fgs <- atlantisom::load_fgs(dir=d.name,
                              file_fgs = functional.groups.file)
  #Get just the names of active functional groups
  fgs.names <- fgs %>%
    dplyr::filter(IsTurnedOn == 1) %>%
    dplyr::select(Name) %>%
    .$Name
  
  # should return all model areas
  boxpars <- atlantisom::load_box(d.name, box.file)
  boxall <- c(0:(boxpars$nbox - 1))
  
  # generalized timesteps all models
  runpar <- atlantisom::load_runprm(d.name, run.prm.file)
  noutsteps <- runpar$tstop/runpar$outputstep
  stepperyr <- if(runpar$outputstepunit=="days") 365/runpar$toutinc
  timeall <- c(0:noutsteps)
  
  # learned the hard way this can be different from ecosystem outputs
  fstepperyr <- if(runpar$outputstepunit=="days") 365/runpar$toutfinc
  
  # load the biomass index results
  atBtxt <- atlantisom::load_bioind(d.name, bioind.file, fgs)
  
  #load the catch results
  atCtxt <- atlantisom::load_catch(d.name, catch.file, fgs)
  
  # load YOY
  YOY <- atlantisom::load_yoy(d.name, paste0(scenario.name, "YOY.txt"))
  
  # load biol_prm
  biol <- atlantisom::load_biolprm(d.name, biol.prm.file)
  
  
  diaglist <-list("fgs" = fgs,
                  "fgs.names" = fgs.names,
                  "atBtxt" = atBtxt,
                  "atCtxt" = atCtxt,
                  "runpar" = runpar,
                  "noutsteps" = noutsteps,
                  "stepperyr" = stepperyr,
                  "timeall" = timeall,
                  "fstepperyr" = fstepperyr,
                  "boxpars" = boxpars,
                  "boxall" = boxall,
                  "YOY" = YOY,
                  "biol" = biol)
  
  return(diaglist)
  
}

Usage: this produces the list object testdiag to be used in subsequent functions.

# clear the local model output directory for safety if needed
#f <- list.files(d.name, include.dirs = F, full.names = T, recursive = T)
#file.remove(f)

testdiag <- diag_init("DiagnosticConfig.R")
## Using an auto-discovered, cached token.
## To suppress this message, modify your code or options to clearly consent to the use of a cached token.
## See gargle's "Non-interactive auth" vignette for more details:
## https://gargle.r-lib.org/articles/non-interactive-auth.html
## The googledrive package is using a cached token for sarah.gaichas@noaa.gov.

Now we define one function for each diagnostic.

Persistence Test

This should be run on an unfished, unperturbed run. Fishing or perturbations may legitimately drive groups extinct. However, for our historical period, we don’t expect anything to go extinct that is in the system now, so this test includes historical fishing, physics, and phytoplankton drivers.

This test assumes that initial (Time = 0) biomass does not count, and averages over a year to see if mean B is 0. Output is a list of species that have crashed and an optional set of plots where 0 biomass is red. The function diag_persist is called with the atBtxt and fgs.names output of the initialization function. Plots can be turned off with plot=FALSE added to the function call.

diag_persist <- function(atBtxt, fgs.names, plot=TRUE){
  
  # need in annual units? Or fail when any output timestep below threshold?
  # make safe for migratory species, assume that over the course of the year mean B > 0.
  # assumes biomass never goes negative in atlantis
  
  crash <- atBtxt %>%
    filter(species %in% fgs.names) %>%
    mutate(yr = ceiling(time/365)) %>%
    filter(yr > 0) %>%
    group_by(species, yr) %>%
    summarise(meanB = mean(atoutput)) %>%
    filter(meanB == 0)
  
  # flag any groups with any mean annual biomass below a threshold
  
  crashed <- crash %>%
    group_by(species) %>%
    count()
  
  names(crashed) <- c("Failed", "nYrs0B")
  
  # visualize; hardcoded pages for ~89 group NEUS model
  
  if(plot){
    
    atBtxt$col <- cut(atBtxt$atoutput,
               breaks = c(-Inf, 0, Inf),
               labels = c("crashed", ">0 B"))
    
    plotB <-ggplot() +
      geom_line(data=atBtxt%>%filter(species %in% fgs.names), 
                aes(x=time/365,y=atoutput, color=col),
                alpha = 10/10) +
      ggthemes::theme_tufte() +
      theme(legend.position = "top") +
      labs(colour=g.name)
    
    print(plotB + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 3, page = 1, scales="free")) 
    print(plotB + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 3, page = 2, scales="free")) 
    print(plotB + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 3, page = 3, scales="free")) 
    print(plotB + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 3, page = 4, scales="free")) 
    print(plotB + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 3, page = 5, scales="free"))
    print(plotB + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 3, page = 6, scales="free")) 
    print(plotB + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 3, page = 7, scales="free")) 
    print(plotB + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 3, page = 8, scales="free")) 
  }
  
  return(as.data.frame(crashed))
  
}

Usage, with output:

diag_persist(testdiag$atBtxt, testdiag$fgs.names)

##           Failed nYrs0B
## 1        Carrion     55
## 2 Drums_Croakers      1
## 3       Menhaden     45
## 4 Summerflounder      8

Stability test

This test determines whether the model reaches steady state for last ~20 years of an unperturbed, unfished ~100 year run. I don’t have a 100 year run right now so we will test with the 55 year run and look at the last 20 years, with the understanding that the test should use a different model run.

This test evaluates the last 20 years of the input run for a significant slope using the geom_gls function from ecodata. The function diag_stable also checks for persistence and flags groups that have gone to 0 biomass–they don’t count as stable. It takes the same inputs as diag_persist with the addition of the runpar object, and defaults to plotting, which can be turned off with plot=FALSE in the function call.

diag_stable <- function(atBtxt, fgs.names, runpar, plot=TRUE){
  
  # look for non-significant slope over last 20 years? 30 years would be better
  nlast <- 20
  
  startlast <- floor(runpar$nyears)-nlast
  
  #warning, was getting different behavior with this code on my linux machine
  
  stable <- atBtxt%>%filter(species %in% fgs.names) %>%
    filter(time %in% seq(startlast*365, floor(runpar$nyears)*365, by=365)) %>%
    group_by(species) %>%
    ggplot(aes(x=time/365, y=atoutput)) +
    ggthemes::theme_tufte() +
    geom_line(data=atBtxt%>%filter(species %in% fgs.names), 
              aes(x=time/365, y=atoutput),
                alpha = 5/10) +
    geom_gls(warn = FALSE)
  
  
  print(stable + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 3, page = 1, scales="free"))
  print(stable + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 3, page = 2, scales="free"))
  print(stable + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 3, page = 3, scales="free"))
  print(stable + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 3, page = 4, scales="free"))
  print(stable + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 3, page = 5, scales="free"))
  print(stable + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 3, page = 6, scales="free"))
  print(stable + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 3, page = 7, scales="free"))
  print(stable + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 3, page = 8, scales="free"))
  
  # what I need is to extract the output of geom_gls that is NULL (no significant trend)
  
  # this may work eventually but blows up R right now, fix later
  # # get results from ecotrend package
  # stabletest <- atBtxt %>%
  #   filter(species %in% fgs.names) %>%
  #   #mutate(yr = ceiling(time/365)) %>%
  #   filter(time %in% seq(startlast*365, floor(runpar$nyears)*365, by=365)) %>%
  #   group_by(species) %>%
  #   dplyr::do(ecotrend::glsMs(data = ., formula = time ~ atoutput))
  # 
   #ecotrend::glsMs(data = ., formula = yr ~ biomass)
   
   #ecotrend::glsMs(yr ~ biomass, stabletest)
  
  #test this run for persistence too
  
  # flag any groups with any mean annual biomass below a threshold

  
}

Usage:

diag_stable(testdiag$atBtxt, testdiag$fgs.names, testdiag$runpar)
## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

## Warning: Computation failed in `stat_gls()`:
## computed "gls" fit is singular, rank 1

Old below here, archive only

# This has now been moved to the DiagnosticConfig.R file and uses atlantisdrive

# set up files to be read in
# the below can go into a config file to be sourced

# input files directly from atlantis-neus repo
functional.groups.file <- "neus_groups.csv"  
biomass.pools.file <- "neus_init.nc"
box.file <- "neus_tmerc_RM2.bgm"
initial.conditions.file <- "neus_init.nc"
fisheries.file <- "neus_fisheries.csv"
biol.prm.file <- "at_biology.prm"

# outputs from gdrive
# location of files, have user pass to diag_init function
g.name <- "https://drive.google.com/drive/folders/1CYLbZlrbjRnXQODeQKb5xUGo6Ry3DC23" # Master_20201020 gdrive folder name 

#g.name <- "https://drive.google.com/drive/folders/1CECkhH80QqGQiHRKG-L7SW1v27VckhWD" # Output_11_13_2020_Rob

#get scenario name from a file inside folder: SSB is unique
scenario.name <- gsub("SSB.txt","",(drive_ls(path=g.name, pattern = "*SSB.txt")$name))

d.name <- here("diagnostics", "temp")
# need to download needed files to a temp folder because atlantisom functions dont read gdrive directly

run.prm.file <- drive_ls(path=g.name, pattern = "*at_run*.xml")$name
bioind.file <- paste0(scenario.name, "BiomIndx.txt")
catch.file <- paste0(scenario.name, "Catch.txt")

# use Andy's atlantisdrive functions instead, wrote this before 
drive_download(file = drive_ls(path=g.name, pattern = "*at_run*.xml"),
                        path = here("diagnostics", "temp", run.prm.file),
                        type = NULL,
                        overwrite = TRUE,
                        verbose = TRUE
)

drive_download(file = drive_ls(path=g.name, pattern = paste0(scenario.name, "BiomIndx.txt")),
                        path = here("diagnostics", "temp", bioind.file),
                        type = NULL,
                        overwrite = TRUE,
                        verbose = TRUE
)

drive_download(file = drive_ls(path=g.name, pattern = paste0(scenario.name, "Catch.txt")),
                        path = here("diagnostics", "temp", catch.file),
                        type = NULL,
                        overwrite = TRUE,
                        verbose = TRUE
)

Get functional group names for other functions

#Load functional groups
funct.groups <- load_fgs(dir=here("currentVersion"),
                         file_fgs = functional.groups.file)
#Get just the names of active functional groups
funct.group.names <- funct.groups %>% 
  filter(IsTurnedOn == 1) %>%
  select(Name) %>%
  .$Name

Get basic output parameters

# should return all model areas
boxpars <- load_box(here("currentVersion"), box.file)
boxall <- c(0:(boxpars$nbox - 1))

# generalized timesteps all models
runpar <- load_runprm(d.name, run.prm.file)
noutsteps <- runpar$tstop/runpar$outputstep
stepperyr <- if(runpar$outputstepunit=="days") 365/runpar$toutinc
midptyr <- round(median(seq(0,stepperyr)))

# a survey that takes place once per year mid year
annualmidyear <- seq(midptyr, noutsteps, stepperyr)

timeall <- c(0:noutsteps)

# learned the hard way this can be different from ecosystem outputs
fstepperyr <- if(runpar$outputstepunit=="days") 365/runpar$toutfinc

Persistence test

This should be run on an unfished, unperturbed run. Fishing or perturbations may legitimately drive groups extinct.

# read the output atlantis biom.txt file

atBtxt <- load_bioind(d.name, bioind.file, funct.groups)

# visualize; hardcoded pages for ~89 group model

plotB <-ggplot() +
  geom_line(data=atBtxt, aes(x=time/365,y=atoutput, color="txt output B"),
             alpha = 10/10) + 
  ggthemes::theme_tufte() +
  theme(legend.position = "top") +
  labs(colour=scenario.name)

plotB + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 4, page = 1, scales="free") 
plotB + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 4, page = 2, scales="free") 
plotB + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 4, page = 3, scales="free") 
plotB + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 4, page = 4, scales="free") 
plotB + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 4, page = 5, scales="free")
plotB + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 4, page = 6, scales="free") 

# need in annual units? Or fail when any output timestep below threshold?
# make safe for migratory species, assume that over the course of the year mean B > 0.
# assumes biomass never goes negative in atlantis

crash <- atBtxt %>%
  group_by(species, time) %>%
  summarise(meanB = mean(atoutput)) %>%
  filter(meanB < 1e-4)

# flag any groups with any mean annual biomass below a threshold

unique(crash$species)

Stability test

Model reaches steady state for last ~20 years of an unperturbed, unfished ~100 year run. Things at 0 shouldn’t count, so run persistence test as well. For plotting, let’s skip the first 20 model years so we only look after spinup.

scenario.name2 <- "atneus_v15_test2008hydro_20180208"
funct.groups <- testdiag$fgs

# read in long run from separate folder
longatBtxt <- read.table(file.path(here("diagnostics", "testfiles", "100_year_no_fishing"), paste0(scenario.name2, "BiomIndx.txt")), header=T)
groupslookup <- funct.groups %>% #assumes we have the same functional groups in this run as the shorter ones
  filter(IsTurnedOn > 0)

# read in run pars for the long run too
longrunpar <- load_runprm(here("diagnostics", "testfiles", "100_year_no_fishing"), run.prm.file2) #assumes same name as above, it is
longnoutsteps <- longrunpar$tstop/longrunpar$outputstep

longtimeall <- c(0:longnoutsteps)

longatBtxttidy <- longatBtxt %>%
  select(Time:DIN) %>%
  rename_(.dots=with(groupslookup, setNames(as.list(as.character(Code)), Name))) %>%
  gather(species, biomass, -Time) %>%
  mutate(yr = ceiling(Time/365))  %>%
  filter(Time %in% seq(20*365, floor(longrunpar$nyears)*365, by=365)) 


# look for non-significant slope over last 20 years? 30 years would be better
nlast <- 20

startlast <- floor(longrunpar$nyears)-nlast

#warning, was getting different behavior with this code on my linux machine

stable <- longatBtxttidy %>%
  filter(Time %in% seq(startlast*365, floor(longrunpar$nyears)*365, by=365)) %>%
  group_by(species) %>%
  ggplot(aes(x=yr, y=biomass)) +
  ggthemes::theme_tufte() +
  geom_line(data=longatBtxttidy, aes(x=yr, y=biomass)) +
  geom_gls(warn = FALSE)


stable + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 4, page = 1, scales="free")
stable + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 4, page = 2, scales="free")
stable + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 4, page = 3, scales="free")
stable + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 4, page = 4, scales="free")
stable + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 4, page = 5, scales="free")
stable + ggforce::facet_wrap_paginate(~species, ncol=4, nrow = 4, page = 6, scales="free")

# what I need is to extract the output of geom_gls that is NULL (no significant trend)

# this may work eventually but blows up R right now, fix later
# # get results from ecotrend package
# stabletest <- longatBtxttidy %>%
#   filter(Time %in% seq(startlast*365, floor(longrunpar$nyears)*365, by=365)) %>%
#   group_by(species) %>%
#   dplyr::do(ecotrend::glsMs(data = ., formula = yr ~ biomass))
# 
# #ecotrend::glsMs(data = ., formula = yr ~ biomass)
# 
# #ecotrend::glsMs(yr ~ biomass, stabletest)

#test this run for persistence too

longcrash <- longatBtxt %>%
  select(Time:DIN) %>%
  rename_(.dots=with(groupslookup, setNames(as.list(as.character(Code)), Name))) %>%
  gather(species, biomass, -Time) %>%
  mutate(yr = ceiling(Time/365))  %>%
  group_by(species, yr) %>%
  summarise(meanB = mean(biomass)) %>%
  filter(meanB < 1e-4)

# flag any groups with any mean annual biomass below a threshold

unique(longcrash$species)

Establish hindcast period

See discussion here and here.

The period is defined as 1980-2010 for trend comparison. Change it here to see different results:

hindcast <- c(1980:2010)

Data sources include ecodata and stock assessment outputs for major species.

Interim step: which species to include and what data are in the reference set for comparison

Not all species need to match, we will first determine the most important subset(s). Based on discussion, we want to track the species the comprise 80% of biomass (and 80% of catch and 80% of revenue).

Code to evaluate which species are most important based on biomass, catch, and revenue, union and intersection.

# first past B, catch, revenue, use ecodata

survbio <- ecodata::nefsc_survey_disaggregated %>%
  filter(Time %in% hindcast) %>%
  group_by(comname) %>%
  summarize(avgkgtow = mean(kg.per.tow, na.rm = T)) %>%
  mutate(prop = avgkgtow/sum(avgkgtow, na.rm = T)) %>%
  arrange(desc(prop))

hibiosp <- stringr::str_to_sentence(survbio$comname[cumsum(survbio$prop)<0.81 & !is.na(cumsum(survbio$prop))])

# comdat and bennet are already aggregated in ecodata

# need comland, should be able to run similar code as above to get catch and revenue
# NOTE: THIS FILE NOT TO BE POSTED ON GITHUB DUE TO POTENTIAL CONFIDENTIALITY CONCERNS
# Use the comlandr package to create the comland data set or ask Sean for it
comland <- get(load(here::here("diagnostics", "comland_meatwt_deflated_stat_areas.RData")))

# from https://github.com/NOAA-EDAB/Atlantis-Catch-Files/blob/master/Atlantis_1_5_groups_svspp_nespp3.csv
spcodes <- as.data.frame(readr::read_csv(here::here("diagnostics","Atlantis_1_5_groups_svspp_nespp3.csv")))

# time series in case we want to plot them
comlandts <- merge(comland, spcodes, by = "NESPP3") %>%
  filter(YEAR %in% hindcast) %>%
  group_by(YEAR, Name) %>%
  summarise_at(vars(SPPLIVMT, SPPVALUE), sum) 

comlandat <- comlandts %>%
  group_by(Name) %>%
  summarize_at(vars(SPPLIVMT, SPPVALUE), mean, na.rm = T) %>%
  rename(avSPPLIVMT = SPPLIVMT, avSPPVALUE = SPPVALUE) %>%
  mutate(propcatch = avSPPLIVMT/sum(avSPPLIVMT, na.rm = T)) %>%
  mutate(propvalue = avSPPVALUE/sum(avSPPVALUE, na.rm = T)) %>%
  arrange(desc(propcatch))

hicatchsp <- comlandat$Name[cumsum(comlandat$propcatch) <0.81 & !is.na(cumsum(comlandat$propcatch))]

# danger, has to be sorted to be correct
comlandatv <- arrange(comlandat, desc(propvalue))
  
hivalsp <- comlandatv$Name[cumsum(comlandatv$propvalue) <0.81 & !is.na(cumsum(comlandatv$propvalue))]
  
# add recreational catch and rec value/n participants?

# WARNING our names dont match between survbio comnames and comland Name, fix here otherwise have duplicates
# union all 
hibiocatvalsp <- sort(unique(c(hibiosp, hicatchsp, hivalsp)))

Species comprising 80% of survey kg per tow averaged over 1980 to 2010:

Spiny dogfish, Winter skate, Acadian redfish, Haddock, Atlantic cod, Little skate, Atlantic herring, Longfin squid, Silver hake, Smooth dogfish, Butterfish, Pollock, Weakfish, Winter flounder

Species comprising 80% of commercial landings averaged over the same period:

Menhaden, Atlantic herring, Lobster, Atlantic surf clam, Blue crab, Atlantic cod, Ocean quahog, Goosefish, Sea scallop, Atlantic mackerel, Loligo squid, Silver hake, Illex squid

Species comprising 80% of commercial value averaged over the same period:

Lobster, Sea scallop, Blue crab, Atlantic cod, Atlantic surf clam, Atlantic salmon, Summer flounder, Goosefish, Ocean quahog, Menhaden, Loligo squid, Yellowtail flounder, Winter flounder, Haddock

A list with all of these species:

Acadian redfish, Atlantic cod, Atlantic herring, Atlantic mackerel, Atlantic salmon, Atlantic surf clam, Blue crab, Butterfish, Goosefish, Haddock, Illex squid, Little skate, Lobster, Loligo squid, Longfin squid, Menhaden, Ocean quahog, Pollock, Sea scallop, Silver hake, Smooth dogfish, Spiny dogfish, Summer flounder, Weakfish, Winter flounder, Winter skate, Yellowtail flounder

Code to develop the hindcast comparison (call it “reference”) dataset:

# simplest is annual output comparison, but which season should be compared to Atlantis? 

# WARNING our names dont match between survbio comnames and comland Name, fix above

survbio.ts <- ecodata::nefsc_survey_disaggregated %>%
  filter(Time %in% hindcast) %>%
  mutate(comname = stringr::str_to_sentence(comname)) %>%
  filter(comname %in% hibiocatvalsp) %>%
  group_by(Time, Season, comname) %>%
  summarise(annkgtow = sum(kg.per.tow))

tsplot <- ggplot(survbio.ts, aes(Time, annkgtow, color=Season)) + geom_point() + ggthemes::theme_tufte() + facet_wrap(~comname, scales = "free")

grid.draw(shift_legend(tsplot))
## Warning: Removed 512 rows containing missing values (geom_point).

# ecosystem indicators for comparison: total survey biomass
survbio.tot <- ecodata::nefsc_survey_disaggregated %>%
  filter(Time %in% hindcast) %>%
  group_by(Time, Season) %>%
  summarise(annkgtow = sum(kg.per.tow, na.rm = TRUE))

ggplot(survbio.tot, aes(Time, annkgtow)) + geom_point() + ggthemes::theme_tufte() + facet_wrap(~Season)

References

1. Kaplan IC, Marshall KN. A guinea pig’s tale: Learning to review end-to-end marine ecosystem models for management applications. ICES Journal of Marine Science. 2016;73: 1715–1724. doi:10.1093/icesjms/fsw047

2. Pethybridge HR, Weijerman M, Perrymann H, Audzijonyte A, Porobic J, McGregor V, et al. Calibrating process-based marine ecosystem models: An example case using Atlantis. Ecological Modelling. 2019;412: 108822. doi:10.1016/j.ecolmodel.2019.108822